A Method for Compiling High-Level Language Programs to a Reconfigurable Data-Flow Processor

1 Introduction

This document describes a method for compiling a subset of a high-level programming language (HLL)
like C or FORTRAN, extended by port access functions, to a reconfigurable data-flow processor (RDFP)
as described in Section 3. The program is transformed to a configuration of the RDFP.

This method can be used as part of an extended compiler for a hybrid architecture consisting of a standard
host processor and a reconfigurable data-flow coprocessor. The extended compiler handles a full HLL
like standard ANSI C. It maps suitable program parts like inner loops to the coprocessor and the rest
of the program to the host processor. It is also possible to map separate program parts to separate
configurations. However, these extensions are not the subject of this document.


2 Compilation Flow

This section briefly describes the phases of the compilation method.

2.1 Frontend

The compiler uses a standard frontend which translates the input program (e.g. a C program) into an
internal format consisting of an abstract syntax tree (AST) and symbol tables. The frontend also performs
well-known compiler optimizations such as constant propagation, dead code elimination, common
subexpression elimination etc. For details, refer to any compiler construction textbook like [1]. The SUIF
compiler [2] is an example of a compiler providing such a frontend.

2.2 Control/Dataflow Graph Generation

Next, the program is mapped to a control/dataflow graph (CDFG) consisting of connected RDFP
functions. This phase is the main subject of this document and is presented in Section 4.

2.3 Configuration Code Generation

Finally, the last phase translates the CDFG to configuration code used to program the RDFP. For
PACT XPP Cores, the configuration code is generated as an NML (Native Mapping Language) file.

3 Configurable Objects and Functionality of an RDFP

This section describes the configurable objects and functionality of an RDFP. A possible implementation
of the RDFP architecture is a PACT XPP Core. Only the minimum requirements on an RDFP for this
compilation method to work are described here. The only data types considered are multi-bit words
called data and single-bit control signals called events. Data and events are always processed as packets,
cf. Section 3.2. Event packets are called 1-events or 0-events, depending on their bit value.
3.1 Configurable Objects and Functions

An RDFP consists of an array of configurable objects and a communication network. Each object can
be configured to perform certain functions (listed below). It performs the same function repeatedly until
the configuration is changed. The array need not be completely uniform, i.e. not all objects need to be
able to perform all functions. E.g., a RAM function can be implemented by a specialized RAM object
which cannot perform any other functions. It is also possible to combine several objects into a "macro" to
realize certain functions. Several RAM objects can, e.g., be combined to realize a RAM function with
larger storage.
Figure 1: Functions of an RDFP

The following functions for processing data and event packets can be configured into an RDFP. See Fig. 1
for a graphical representation.

• ALU[opcode]: ALUs perform common arithmetical and logical operations on data. ALU functions
("opcodes") must be available for all operations used in the HLL.¹ ALU functions have two
data inputs A and B, and one data output X. Comparators have an event output U instead of the
data output. They produce a 1-event if the comparison is true, and a 0-event otherwise.

¹Otherwise, programs containing operations which do not have ALU opcodes in the RDFP must be excluded from the
supported HLL subset or substituted by "macros" of existing functions.
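• CNT: A counter function which has data inputs LB, UB and INC (lower bound, upper bound
and increment) and data output X (counter value). A packet at event input START starts the
counter, and event input NEXT causes the generation of the next output value (and output events)
or causes the counter to terminate if UB is reached. If NEXT is not connected, the counter counts
continuously. The output events U, V, and W have the following functionality: For a counter
counting N times, N-1 0-events and one 1-event are generated at output U. At output V, N
0-events are generated, and at output W, N 0-events and one 1-event are created. The W event
packet is only created after the counter has terminated, i.e. after a NEXT event packet was
received following the last data packet that was output.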

• RAM[size]: The RAM function stores a fixed number of data words ("size"). It has a data input
RD and a data output OUT for reading at address RD. Event output ERD signals completion of
the read access. For a write access, data inputs WR and IN (address and value) and data output
OUT are used. Event output EWR signals completion of the write access. ERD and EWR always
generate 0-events. Note that external RAM can be handled as RAM functions exactly like internal RAM.

• GATE: A GATE synchronizes a data packet at input A and an event packet at input E. When
both inputs have arrived, they are both consumed. The data packet is copied to output X, and the
event packet to output U.

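• MUX: A MUX function has 2 data inputs A and B, an event input SEL, and a data output X. If
SEL receives a 0-event, input A is copied to output X and input B is discarded. For a 1-event, B is
copied and A is discarded.

• MERGE: A MERGE function has 2 data inputs A and B, an event input SEL, and a data output X.
If SEL receives a 0-event, input A is copied to output X, but input B is not discarded. The packet
is left at the input B instead. For a 1-event, B is copied and A is left at the input.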

• DEMUX: A DEMUX function has one data input A, an event input SEL, and two data outputs X
and Y. If SEL receives a 0-event, input A is copied to output X, and no packet is created at output
Y. For a 1-event, A is copied to Y, and no packet is created at output X.

• MDATA: An MDATA function multiplies (replicates) data packets. It has a data input A, an event
input SEL, and a data output X. If SEL receives a 1-event, a data packet at A is consumed and copied
to output X. For all subsequent 0-events at SEL, a copy of the input data packet is produced
at the output without consuming new packets at A. Only if another 1-event arrives at SEL is the
next data packet at A consumed and copied.²

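• INPORT[name]: Receives a data packet from outside the RDFP through input port "name" and
copies it to data output X. If a packet was received, a 0-event is produced at event output U,
too. (Note that this function can only be configured at special objects connected to external busses.)

• OUTPORT[name]: Sends a data packet received at data input A to the outside of the RDFP through
output port "name". If the packet was sent, a 0-event is produced at event output U, too. (Note that
this function can only be configured at special objects connected to external busses.)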
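The difference between the four data-steering functions is easy to miss in prose, so the following minimal Python sketch models them on simple packet queues. It is an illustration only, not the XPP implementation: the function names, the use of `deque` as an input queue, and the reading of MUX as "consume both inputs, discard the unselected one" are assumptions made for this sketch, and MDATA is assumed to see a 1-event first.

```python
from collections import deque

def mux(a, b, sel):
    """MUX: each SEL event consumes one packet from A and one from B."""
    out = []
    for s in sel:
        pa, pb = a.popleft(), b.popleft()
        out.append(pb if s else pa)      # the unselected packet is discarded
    return out

def merge(a, b, sel):
    """MERGE: only the selected input is consumed; the other packet stays queued."""
    return [b.popleft() if s else a.popleft() for s in sel]

def demux(a, sel):
    """DEMUX: route each packet from A to X (0-event) or to Y (1-event)."""
    x, y = [], []
    for s in sel:
        (y if s else x).append(a.popleft())
    return x, y

def mdata(a, sel):
    """MDATA: a 1-event loads the next packet from A, a 0-event repeats the held one."""
    out, held = [], None
    for s in sel:
        if s:
            held = a.popleft()
        out.append(held)
    return out
```

With inputs A = 1, 2 and B = 10, 20, the MUX needs one packet on both inputs per SEL event, whereas the MERGE can serve three SEL events 0, 1, 0 from the same four packets because unselected packets remain at their input.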

²Note that this can be implemented by a MERGE with special properties on XPP.

Additionally, the following functions manipulate only event packets:
• 0-FILTER, 1-FILTER: A FILTER has an input E and an output U. A 0-FILTER copies a 0-event
from E to U, but 1-events at E are discarded. A 1-FILTER copies 1-events and discards 0-events.

• INVERTER: Copies all events from input E to output U but inverts their value.

• 0-CONSTANT, 1-CONSTANT: 0-CONSTANT copies all events from input E to output U, but
changes them all to value 0. 1-CONSTANT changes all to value 1.

• ECOMB: Combines two or more inputs E1, E2, E3, ..., producing a packet at output U. The output
is a 1-event if and only if one or more of the input packets are 1-events (logical or). A packet must
be available at all inputs before an output packet is produced.³

• ESEQ[seq]: An ESEQ generates a sequence "seq" of events, e.g. "0001", at its output U. If it
has an input START, one entire sequence is generated for each event packet arriving at START. The
sequence is only repeated if the next event arrives at START. However, if START is not connected,
ESEQ constantly repeats the sequence.



Note that ALU, MUX, DEMUX, GATE and ECOMB functions behave like their equivalents in classical
dataflow machines [3, 4].
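The event functions listed above are small enough to be modelled as list transformations. The sketch below is a hedged illustration in Python: the function names are chosen for this example, events are plain integers 0 and 1, and each list stands for the sequence of packets seen at one input over time.

```python
def filter_events(events, keep):
    """0-FILTER (keep=0) or 1-FILTER (keep=1): pass matching events, discard the rest."""
    return [e for e in events if e == keep]

def inverter(events):
    """INVERTER: flip the value of every event."""
    return [1 - e for e in events]

def constant(events, value):
    """0-CONSTANT / 1-CONSTANT: keep the event count, force the value."""
    return [value for _ in events]

def ecomb(*inputs):
    """ECOMB: wait for one packet on every input, then emit their logical OR."""
    return [int(any(packets)) for packets in zip(*inputs)]

def eseq(seq, starts):
    """ESEQ[seq] with START connected: one full sequence per START event."""
    out = []
    for _ in starts:
        out.extend(int(c) for c in seq)
    return out
```

The combination `constant(filter_events(e, 1), 0)` is the "1-FILTER followed by 0-CONSTANT" pattern that the later sections use repeatedly to turn selected 1-events into START events.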



3.2 Packet-based Communication Network



The communication network of an RDFP can connect an output of one object (i.e. its respective
function) to the input(s) of one or several other objects. This is usually achieved by busses and switches. By
placing the functions properly on the objects, many functions can be connected arbitrarily up to a limit
imposed by the device size. As mentioned above, all values are communicated as packets. A separate
communication network exists for data and event packets. The packets synchronize the functions as in a
dataflow machine with acknowledge [3], i.e. the function only executes when all input packets are
available (apart from the non-strict exceptions as described above). The function also stalls if the last output
packet has not been consumed. Therefore a data-flow graph mapped to an RDFP self-synchronizes its
execution without the need for external control. Only if two or more function outputs (data or event) are
connected to the same function input ("N to 1 connection") is the self-synchronization disabled.⁴ The
user has to ensure that only one packet arrives at a time in a correct CDFG. Otherwise a packet might
get lost, and the value resulting from combining two or more packets is undefined. However, a function
output can be connected to many function inputs ("1 to N connection") without problems.

There are some special cases:



• A function input can be preloaded with a distinct value during configuration. This packet is
consumed like a normal packet coming from another object.

• A function input can be defined as constant. In this case, the packet at the input is reproduced
repeatedly for each function execution.
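The firing rule and the two special cases can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the class name `Function`, its fields, and the single-slot model of inputs and output are hypothetical and chosen only to illustrate the rule "execute when all inputs hold a packet and the previous output has been consumed".

```python
class Function:
    """One configured function with single-packet input slots and one output slot."""

    def __init__(self, op, n_inputs, constant=None, preload=None):
        self.op = op
        self.inputs = [None] * n_inputs
        self.constant = set(constant or ())        # indices of inputs defined as constant
        for index, value in (preload or {}).items():
            self.inputs[index] = value             # packet placed during configuration
        self.output = None

    def can_fire(self):
        all_present = all(p is not None for p in self.inputs)
        return all_present and self.output is None  # stall while output is unconsumed

    def fire(self):
        if not self.can_fire():
            return False
        self.output = self.op(*self.inputs)
        for index in range(len(self.inputs)):
            if index not in self.constant:          # constant inputs are reproduced
                self.inputs[index] = None           # normal inputs are consumed
        return True
```

A preloaded input that is not marked constant behaves like an ordinary packet and is gone after the first execution, while a constant input is available again for every execution.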

³Note that this function is implemented by the EAND operator on the XPP.

⁴Note that on XPP Cores, an "N to 1 connection" for events is realized by the EOR function, and for data by just assigning
several outputs to an input.
An RDFP requires register delays in the dataflow. Otherwise very long combinational delays and
asynchronous feedback are possible. We assume that delays are inserted at the inputs of some functions (like
for most ALUs) and in some routing segments of the communication network. Note that registers change
the timing, but not the functionality of a correct CDFG.

4 Configuration Generation

4.1 Language Definition

The following HLL features are not supported by the method described here:

• pointer operations

• library calls, operating system calls (including standard I/O functions)

• recursive function calls (Note that non-recursive function calls can be eliminated by function
inlining and therefore are not considered here.)

• All scalar data types are converted to type integer. Integer values are equivalent to data packets in
the RDFP. Arrays (possibly multi-dimensional) are the only composite data types considered.

The following additional features are supported: 

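4.2 Mapping of High-Level Language Constructs

This method converts an HLL program to a CDFG consisting of the RDFP functions defined in Section 3.1.
Before the processing starts, all HLL program arrays are mapped to RDFP RAM functions. If several
arrays are mapped to the same RAM, an offset is assigned to each of them. The RAMs are added to an
initially empty CDFG. There must be enough RAMs of sufficient size for all program arrays.

The CDFG is generated by a traversal of the AST of the HLL program. It processes the program statement
by statement and descends into the loops and conditional statements as appropriate. The following
two pieces of information are updated at every program point during the traversal:

• START points to an event output of an RDFP function. This output delivers a 0-event whenever
the program execution reaches this program point. At the beginning, a 0-CONSTANT preloaded
with an event input is added to the CDFG. START initially points to its output. This event is used
to start the overall program execution. The STARTnew signal generated after a program part has
finished executing is used as the new START signal for the following program part, or it signals
termination of the entire program. The START events guarantee that the execution order of the
original program is maintained wherever the data dependencies alone are not sufficient. This
scheduling scheme is similar to a one-hot controller for digital hardware.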

• VARLIST is a list of {variable, function-output} pairs. The pairs map integer variables or array
elements to a CDFG function's output. The first pair for a variable in VARLIST contains the
output of the function which produces the value of this variable valid at the current program point.
New pairs are always added to the front of VARLIST. The expression VARDEF(var) refers to the
function-output of the first pair with variable var in VARLIST.⁶

The following subsections systematically list all HLL program components and describe how they are
processed, thereby altering the CDFG, START and VARLIST.
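VARLIST and VARDEF are simple enough to pin down in code. The following Python sketch assumes a list of `(variable, output)` tuples with the newest pair in front; the helper names `define` and `vardef` and the output strings such as `"ALU2.X"` are illustrative only.

```python
def define(varlist, var, output):
    """Return a new VARLIST with the pair {var, output} added to the front."""
    return [(var, output)] + varlist

def vardef(varlist, var):
    """VARDEF(var): the function output of the first pair for var."""
    for name, output in varlist:
        if name == var:
            return output
    raise KeyError(var)

varlist = define([], "a", 5)                 # a constant pseudo-object
varlist = define(varlist, "b", "ALU2.X")     # an output of a CDFG function
varlist = define(varlist, "a", "ALU3.X")     # a is redefined; the old pair stays behind
```

Older pairs are never removed by a redefinition; they are merely shadowed, which is what allows copies of the list to be compared later when branches are combined.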

4.2.1 Integer Expressions and Assignments

Straight-line code without array accesses can be directly mapped to a data-flow graph. One ALU is
allocated for each operator in the program. Because of the self-synchronization of the ALUs, no explicit
control or scheduling is needed. Therefore processing these assignments does not access or alter START.
The data dependences (as they would be exposed in the DAG representation of the program [1]) are
analyzed through the processing of VARLIST. These assignments synchronize themselves through the
data-flow. The data-driven execution automatically exploits the available instruction-level parallelism.

All assignments evaluate the right-hand side (RHS) or source expression. This evaluation results in a
pointer to a CDFG object's output (or pseudo-object as defined below). For integer assignments, the
left-hand side (LHS) variable or destination is combined with the RHS result object to form a new pair
{LHS, result(RHS)} which is added to the front of VARLIST.

The simplest statement is a constant assigned to an integer:⁷

    a = 5;

It doesn't change the CDFG, but adds {a, 5} to the front of VARLIST. The constant 5 is a "pseudo-object"
which only holds the value, but does not refer to a CDFG object. Now VARDEF(a) equals 5 at
subsequent program points before a is redefined.

Integer assignments can also combine variables already defined and constants:

    b = a * 2 + 3;

In the AST, the RHS is already converted to an expression tree. This tree is transformed to a combination
of old and new CDFG objects (which are added to the CDFG) as follows: Each operator (internal node)
of the tree is substituted by an ALU with the opcode corresponding to the operator in the tree. If a leaf
node is a constant, the ALU's input is directly connected to that constant. If a leaf node is an integer
variable var, it is looked up in VARLIST, i.e. VARDEF(var) is retrieved. Then VARDEF(var) (an output
of an already existing object in the CDFG or a constant) is connected to the ALU's input. The output of the
ALU corresponding to the root operator in the expression tree is defined as the result of the RHS. Finally,
a new pair {LHS, result(RHS)} is added to VARLIST. If the two assignments above are processed, the
CDFG with two ALUs in Fig. 2 is created.⁸ Outputs occurring in VARLIST are labeled by Roman
numbers. After these two assignments, VARLIST = [{b, I}, {a, 5}]. (The front of the list is on the left
side.) Note that all inputs connected to a constant (whether directly from the expression tree or retrieved
from VARLIST) must be defined as constant. Inputs defined as constants have a small c next to the input
arrow in Fig. 2.

⁶This method of using a VARLIST is adopted from the Transmogrifier C compiler [5].
⁷Note that we use C syntax for the following examples.
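The translation of an expression tree into ALUs can be sketched as a short recursive function. The code below is a simplified Python illustration, not the compiler's actual data structures: expression trees are nested tuples `(operator, left, right)`, ALUs are dictionaries in a list named `cdfg`, and output names such as `"ALU1.X"` stand in for the Roman-numbered outputs of Fig. 2. All of these names are assumptions.

```python
def vardef(varlist, var):
    for name, output in varlist:
        if name == var:
            return output
    raise KeyError(var)

def build_expr(tree, varlist, cdfg):
    """Return the output (or constant pseudo-object) carrying the value of tree."""
    if isinstance(tree, int):
        return tree                               # constant leaf: pseudo-object
    if isinstance(tree, str):
        return vardef(varlist, tree)              # variable leaf: look up VARLIST
    opcode, left, right = tree
    a = build_expr(left, varlist, cdfg)
    b = build_expr(right, varlist, cdfg)
    name = f"ALU{len(cdfg) + 1}"
    const_inputs = [port for port, value in (("A", a), ("B", b))
                    if isinstance(value, int)]    # these inputs are defined as constant
    cdfg.append({"name": name, "opcode": opcode, "A": a, "B": b,
                 "const_inputs": const_inputs})
    return f"{name}.X"

def assign(lhs, tree, varlist, cdfg):
    """Process 'lhs = tree;' and return the updated VARLIST."""
    return [(lhs, build_expr(tree, varlist, cdfg))] + varlist

cdfg = []
varlist = assign("a", 5, [], cdfg)                       # a = 5;
varlist = assign("b", ("+", ("*", "a", 2), 3), varlist, cdfg)  # b = a * 2 + 3;
```

As in the text, the first assignment adds no object to the CDFG, and the second creates two ALUs: a multiplier whose inputs are both constant, and an adder fed by the multiplier's output and the constant 3.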

4.2.2 Conditional Integer Assignments

For conditional if-then-else statements containing only integer assignments, objects for condition
evaluation are created first. The object's event output indicating the condition result is kept for choosing
the correct branch result later. Next, both branches are processed in parallel, using separate copies
VARLIST1 and VARLIST2 of VARLIST. (VARLIST itself is not changed.) Finally, for all variables
added to VARLIST1 or VARLIST2, a new entry for VARLIST is created (combination phase). The valid
definitions from VARLIST1 and VARLIST2 are combined with a MUX function, and the correct input
is selected by the condition result. For variables only defined in one of the two branches, the multiplexer
uses the result retrieved from the original VARLIST for the other branch. If the original VARLIST does
not have an entry for this variable, a special "undefined" constant value is used. However, in a
functionally correct program this value will never be used. As an optimization, only variables live [1] after the
if-then-else structure need to be added to VARLIST in the combination phase.⁹

Consider the following example:

    i = 7;
    a = 3;

    if (i < 10) {
        a = 5;
        c = 7;
    }
    else {
        c = a - 1;
        d = 0;
    }

Fig. 3 shows the resulting CDFG. Before the if-then-else construct, VARLIST = [{a, 3}, {i, 7}]. After
processing the branches, for the then branch, VARLIST1 = [{c, 7}, {a, 5}, {a, 3}, {i, 7}], and for the
else branch, VARLIST2 = [{d, 0}, {c, I}, {a, 3}, {i, 7}]. After combination, VARLIST = [{d, II},
{c, III}, {a, IV}, {a, 3}, {i, 7}].
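The combination phase can be illustrated with a small Python sketch that reproduces the lists of this example. The names `combine`, `lookup` and `UNDEFINED`, the tuple representation of VARLIST, and the MUX naming are assumptions of the sketch; the MUX port assignment (A for the else value, B for the then value) assumes that a true condition yields a 1-event, which selects input B.

```python
UNDEFINED = "undefined"

def lookup(varlist, var):
    for name, output in varlist:
        if name == var:
            return output
    return UNDEFINED                      # no definition on this path

def combine(varlist, varlist1, varlist2, cond, cdfg):
    """Merge the branch lists into a new VARLIST, one MUX per changed variable."""
    added1 = varlist1[:len(varlist1) - len(varlist)]   # pairs added by the then branch
    added2 = varlist2[:len(varlist2) - len(varlist)]   # pairs added by the else branch
    changed = []
    for name, _ in added1 + added2:
        if name not in changed:
            changed.append(name)
    result = list(varlist)
    for var in changed:
        mux = f"MUX{len(cdfg) + 1}"
        cdfg.append({"name": mux, "SEL": cond,
                     "A": lookup(varlist2, var),   # chosen by a 0-event (else)
                     "B": lookup(varlist1, var)})  # chosen by a 1-event (then)
        result.insert(0, (var, f"{mux}.X"))
    return result

base = [("a", 3), ("i", 7)]
then_list = [("c", 7), ("a", 5)] + base
else_list = [("d", 0), ("c", "I")] + base
cdfg = []
merged = combine(base, then_list, else_list, "CMP.U", cdfg)
```

The order of the new pairs differs from the one printed in the text, which does not matter since each variable appears only once among them. Variable d, defined only in the else branch and absent from the original list, receives the "undefined" constant on its then input.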

Note that case- or switch-statements can be processed, too, since they can - without loss of generality -
be converted to nested if-then-else statements.

Processing conditional statements this way does not require explicit control and does not change START.
Both branches are executed in parallel and synchronized by the data-flow. It is possible to pipeline the
dataflow for optimal throughput.

⁸Note that the input and output names can be deduced from their position, cf. Fig. 1. Also note that the compiler
frontend would normally have substituted the second assignment by b = 13 (constant propagation). For the simplicity of this
explanation, no frontend optimizations are considered in this and the following examples.

⁹Definition: A variable is live at a program point if its value is read at a statement reachable from here without intermediate
redefinition.



4.2.3 General Conditional Statements

Conditional statements containing either array accesses (cf. Section 4.2.7 below) or inner loops cannot
be processed as described in Section 4.2.2. Data packets must only be sent to the active branch. This is
achieved by the implementation shown in Fig. 8, similar to the method presented in [4].

A dataflow analysis is performed to compute used sets use and defined sets def [1] of both branches.¹⁰
For the current VARLIST entries of all variables in IN = use(thenbody) ∪ def(thenbody) ∪
use(elsebody) ∪ def(elsebody) ∪ use(header), DEMUX functions controlled by the IF condition are
inserted. Note that arrows with double lines in Fig. 8 denote connections for all variables in IN, and the
shaded DEMUX function stands for several DEMUX functions, one for each variable in IN. The
DEMUX functions forward data packets only to the selected branch. New lists VARLIST1 and VARLIST2
are compiled with the respective outputs of these DEMUX functions. The then-branch is processed with
VARLIST1, and the else-branch with VARLIST2. Finally, the output values are combined. OUT
contains the new values for the same variables as in IN. Since only one branch is ever activated there will not
be a conflict due to two packets arriving simultaneously. The combinations will be added to VARLIST
after the conditional statement. If the IF execution is to be pipelined, MERGE opcodes for the output
must be inserted, too. They are controlled by the condition like the DEMUX functions.

The following extension with respect to [4] is added (dotted lines in Fig. 8) in order to control the
execution as mentioned above with START events: The START input is ECOMB-combined with the condition
output and connected to the SEL input of the DEMUX functions. The START inputs of thenbody and
elsebody are generated from the ECOMB output sent through a 1-FILTER and a 0-CONSTANT¹¹ or
through a 0-FILTER, respectively. The overall STARTnew output is generated by a simple "2 to 1
connection" of thenbody's and elsebody's STARTnew outputs. With this extension, arbitrarily nested
conditional statements or loops can be handled within thenbody and elsebody.
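As a rough illustration of the two ingredients of this scheme, the Python sketch below computes the set IN from the use and def sets and derives the branch START events from the START and condition events. Function names and the list-of-events representation are assumptions of the sketch; each list position stands for one execution of the conditional statement.

```python
def in_set(use_then, def_then, use_else, def_else, use_header):
    """IN: all variables that need a DEMUX in front of the branches."""
    return set().union(use_then, def_then, use_else, def_else, use_header)

def branch_starts(start, cond):
    """Derive the DEMUX SEL events and the START events of both branches."""
    sel = [int(s or c) for s, c in zip(start, cond)]   # ECOMB of START and condition
    then_start = [0 for e in sel if e == 1]            # 1-FILTER, then 0-CONSTANT
    else_start = [e for e in sel if e == 0]            # 0-FILTER
    return sel, then_start, else_start
```

Because START events are always 0-events, the ECOMB output simply carries the condition value, but it is only produced once the START event has arrived. For three executions with conditions true, false, true, the then branch is started twice and the else branch once.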



4.2.4 WHILE Loops

WHILE loops are processed similarly to the scheme presented in [4], cf. Fig. 9. As in Section 4.2.3,
double line connections and shaded MERGE and DEMUX functions represent duplication for all variables
in IN. Here IN = use(whilebody) ∪ def(whilebody) ∪ use(header). The WHILE loop executes as
follows: In the first loop iteration, the MERGE functions select all input values from VARLIST at loop
entry (SEL=0). The MERGE outputs are connected to the header and the DEMUX functions. If the
while condition is true (SEL=1), the input values are forwarded to the whilebody, otherwise to OUT.
The output values of the while body are fed back to whilebody's input via the MERGE and DEMUX
operators as long as the condition is true. Finally, after the last iteration, they are forwarded to OUT. The
outputs are added to the new VARLIST.¹²

¹⁰A variable is used in a statement (and hence in a program region containing this statement) if its value is read. A variable
is defined in a statement (or region) if a new value is assigned to it.
¹¹The 0-CONSTANT is required since START events must always be 0-events.
¹²Note that the MERGE function for variables not live at the loop's beginning and the whilebody's beginning can be removed
since its output is not used. For these variables, only the DEMUX function to output the final value is required. Also note that
the MERGE functions can be replaced by simple "2 to 1 connections" if the configuration process guarantees that packets from
IN always arrive at the input before feedback values arrive.

Two extensions with respect to [4] are added (dotted lines in Fig. 9):

• In [4], the SEL input of the MERGE functions is preloaded with 0. Hence the loop execution
begins immediately and can be executed only once. Instead, we connect the START input to the
MERGE's SEL input ("2 to 1 connection" with the header output). This makes it possible to control
the time of the start of the loop execution and to restart it.

• The whilebody's START input is connected to the header output, sent through a 1-FILTER/0-
CONSTANT combination as above (generates a 0-event for each loop iteration). By ECOMB-
combining whilebody's STARTnew output with the header output for the MERGE functions'
SEL inputs, the next loop iteration is only started after the previous one has finished. The while
loop's STARTnew output is generated by filtering the header output for a 0-event.

With these extensions, arbitrarily nested conditional statements or loops can be handled within
whilebody.
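The token flow of the WHILE scheme can be traced with a small functional model. The following Python sketch is not a cycle-accurate simulation; it only follows one set of values around the MERGE/header/DEMUX loop and records the header events, from which the whilebody START events and the loop's STARTnew event are derived as described above. The function name and the dictionary representation of the variables in IN are assumptions.

```python
def run_while(entry_values, header, body):
    """Trace one WHILE loop execution; returns final values and control events."""
    values = dict(entry_values)           # first iteration: MERGE selects entry values
    header_events = []
    while True:
        cond = 1 if header(values) else 0
        header_events.append(cond)
        if cond == 0:                     # DEMUX forwards the values to OUT
            break
        values = body(values)             # DEMUX forwards to whilebody, result fed back
    body_starts = [0 for e in header_events if e == 1]   # 1-FILTER, then 0-CONSTANT
    start_new = [e for e in header_events if e == 0]     # 0-FILTER
    return values, header_events, body_starts, start_new

result = run_while({"i": 0, "s": 0},
                   lambda v: v["i"] < 3,
                   lambda v: {"i": v["i"] + 1, "s": v["s"] + v["i"]})
```

For a loop body executed three times, the header produces three 1-events and one final 0-event, so the whilebody is started three times and exactly one STARTnew event leaves the loop.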

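4.2.5 FOR Loops

FOR loops are particularly regular WHILE loops. Therefore we could handle them as explained above.
However, our RDFP features the special counter function CNT and the data packet multiplication
function MDATA which can be used for a more efficient implementation of FOR loops. This new FOR loop
scheme is shown in Fig. 10.

A FOR loop is controlled by a counter CNT. The lower bound (LB), upper bound (UB), and increment
(INC) expressions are evaluated like any other expressions (see Sections 4.2.1 and 4.2.7) and connected
to the respective inputs.

As opposed to WHILE loops, a MERGE/DEMUX combination is only required for variables in IN1 =
def(forbody), i.e. those defined in forbody. IN1 does not contain variables which are only used
in forbody, LB, UB, or INC, and also does not contain the loop index variable. Variables in IN1 are
processed as in WHILE loops, but the SEL inputs of the MERGE and DEMUX functions are connected to
CNT's W output. (The W output does the inverse of a WHILE loop's header output; it outputs a
1-event after the counter has terminated. Therefore the inputs of the MERGE functions and the
outputs of the DEMUX functions are swapped here.)

CNT's X output provides the current value of the loop index variable. If the final index value is required
(live) after the FOR loop, it is selected with a DEMUX function controlled by CNT's U event output
(which produces one event for every loop iteration).

Variables in IN2 = use(forbody) \ def(forbody), i.e. those defined outside the loop and only used
(and not redefined) inside the loop, are handled differently. Unless it is a constant value, the variable's
input value (from VARLIST) must be reproduced in each loop iteration since it is consumed in each
iteration. The values are reproduced by MDATA functions.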
The following control events (dotted lines in Fig. 10) are similar to the WHILE loop extensions, but
simpler. CNT's START input is connected to the loop's overall START signal. STARTnew is generated
from CNT's W output, sent through a 1-FILTER and 0-CONSTANT. CNT's V output produces one
0-event for each loop iteration and is therefore used as forbody's START. Finally, CNT's NEXT input is
connected to forbody's STARTnew output.

For pipelined loops (as defined below in Section 4.2.6), loop iterations are allowed to overlap. Therefore
CNT's NEXT input need not be connected. Now the counter produces index variable values and control
events as fast as they can be consumed. However, in this case CNT's W output is not sufficient as overall
STARTnew output since the counter terminates before the last iteration's forbody finishes. Instead,
STARTnew is generated from CNT's U output ECOMB-combined with forbody's STARTnew output,
sent through a 1-FILTER/0-CONSTANT combination. The ECOMB produces an event after termination
of each loop iteration, but only the last event is a 1-event because only the last output of CNT's U output
is a 1-event. Hence this event indicates that the last iteration has finished. Cf. Section 4.3 for a FOR loop
example compilation with and without pipelining.
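The two ways of generating STARTnew for a FOR loop can be compared with a short Python sketch. It models only the event outputs of CNT for a counter counting N times; the function names are illustrative, and W is assumed to deliver N 0-events followed by a single 1-event after termination, which matches the statement that W outputs a 1-event once the counter has terminated.

```python
def cnt_events(n):
    """Event outputs U, V, W of a CNT counting n times (n >= 1)."""
    u = [0] * (n - 1) + [1]     # 1-event marks the last iteration
    v = [0] * n                 # one 0-event per iteration: forbody's START
    w = [0] * n + [1]           # assumed: 1-event only after termination
    return u, v, w

def start_new_sequential(w):
    """Non-pipelined loop: W through a 1-FILTER and a 0-CONSTANT."""
    return [0 for e in w if e == 1]

def start_new_pipelined(u, body_done):
    """Pipelined loop: ECOMB(U, forbody STARTnew), then 1-FILTER and 0-CONSTANT."""
    combined = [int(a or b) for a, b in zip(u, body_done)]
    return [0 for e in combined if e == 1]

u, v, w = cnt_events(4)
```

In both variants exactly one 0-event leaves the loop. In the pipelined variant it is produced only when the forbody STARTnew event of the last iteration has arrived, because the ECOMB waits for a packet on both of its inputs.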

As for WHILE loops, these methods make it possible to process arbitrarily nested loops and conditional
statements. The following advantages over WHILE loop implementations are achieved:

• One index variable value is generated by the CNT function each clock cycle. This is faster and
smaller than the WHILE loop implementation which allocates a MERGE/DEMUX/ADD loop and
a comparator for the counter functionality.

• Variables in IN2 (only used in forbody) are reproduced in the special MDATA functions and need
not go through a MERGE/DEMUX loop. This is again faster and smaller than the WHILE loop
implementation.

4.2.6 Vectorization and Pipelining

The method described so far generates CDFGs performing the HLL program's functionality on an RDFP.
However, the program execution is unduly sequentialized by the START signals. In some cases,
innermost loops can be vectorized. This means that loop iterations can overlap, leading to a pipelined dataflow
through the operators of the loop body. The Pipeline Vectorization technique [6] can be easily applied to
the compilation method presented here. As mentioned above, for FOR loops, CNT's NEXT input is
removed so that CNT counts continuously, thereby overlapping the loop iterations.

All loops without array accesses can be pipelined since the dataflow automatically synchronizes
loop-carried dependences, i.e. dependences between a statement in one iteration and another statement in a
subsequent iteration. Loops with array accesses can be pipelined if the array (i.e. RAM) accesses do
not cause loop-carried dependences or can be transformed to such a form. In this case no RAM address
is written in one and read in a subsequent iteration. Therefore the read and write accesses to the same
RAM may overlap. This degree of freedom is exploited in the RAM access technique described below.
Especially for dual-ported RAM it leads to considerable performance improvements.
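To make the pipelining condition concrete, the following Python sketch checks a deliberately restricted class of loops. It assumes that the loop index i counts upward in steps of 1 and that every array access has the form a[i + offset]; loop bounds are ignored, so the check is conservative. Neither the function name nor this restricted access model comes from the method itself.

```python
def has_loop_carried_ram_dependence(writes, reads):
    """writes, reads: lists of (array, offset) for accesses of the form array[i + offset].

    Iteration i writes address i + w; a later iteration i + (w - r) reads the same
    address, so a value written in one iteration is read in a subsequent one
    exactly when w > r for accesses to the same array.
    """
    return any(w_array == r_array and w_offset > r_offset
               for w_array, w_offset in writes
               for r_array, r_offset in reads)
```

For example, `a[i+1] = a[i] + 1` is rejected because each iteration reads what the previous one wrote, while `a[i] = a[i+1]` and `a[i] = b[i]` pass the check.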

4.2.7 Array Accesses

In contrast to scalar variables, array accesses have to be controlled explicitly in order to maintain the
program's correct execution order. As opposed to normal dataflow machine models [3], an RDFP does
not have a single address space. Instead, the arrays are allocated to several RAMs. This leads to a
different approach to handling RAM accesses and opens up new opportunities for optimization.

To reduce the complexity of the compilation process, array accesses are processed in two phases. Phase
1 uses "pseudo-functions" for RAM read and write accesses. A RAM read function has an RD data input
(read address) and an OUT data output (read value), and a RAM write function has WR and IN data
inputs (write address and write value). Both functions are labeled with the array the access refers to, and
both have a START event input and a U event output. The events control the access order. In Phase 2 all
accesses to the same RAM are combined and substituted by a single RAM function as shown in Fig. 1.
This involves manipulating the data and event inputs and outputs such that the correct execution order is
maintained and the outputs are forwarded to the correct part of the CDFG.

Phase 1: Since arrays are allocated to several RAMs, only accesses to the same RAM have to be
synchronized. Accesses to different RAMs can occur concurrently or even out of order. In case of data
dependencies, the accesses self-synchronize automatically. Within pipelined loops, not even read and
write accesses to the same RAM have to be synchronized. This is achieved by maintaining separate
START signals for every RAM or even separate START signals for RAM read and RAM write accesses
in pipelined loops. At the end of a basic block [1]¹⁴, all STARTnew outputs must be combined by an
ECOMB to provide a START signal for the next basic block which guarantees that all array accesses in
the previous basic block are completed. For pipelined loops, this condition can even be relaxed. Only
after the loop exit do all accesses have to be completed. The individual loop iterations need not be
synchronized.

First the RAM addresses are computed. The compiler frontend's standard transformation for array
accesses can be used, and a CDFG function's output is generated which provides the address. If applicable,
the offset with respect to the RDFP RAM (as determined in the initial mapping phase) must be added.
This output is connected to the pseudo RAM read's RD input (for a read access) or to the pseudo RAM
write's WR input (for a write access). Additionally, the OUT output (read) or IN input (write) is
connected. The START input is connected to the variable's START signal, and the U output is used as
STARTnew for the next access.



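To avoid redundant read accesses, RAM reads are also registered in VARLIST. Instead of an integer
variable, an array element is used as the first element of the pair. However, a change in a variable occurring
in an array index invalidates the information in VARLIST. It must then be removed from it.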
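Registering array reads in VARLIST and invalidating them can be sketched as follows. The representation is an assumption made for this Python illustration: scalar variables are keyed by their name, and an array element is keyed by a tuple of the array name and the tuple of variables occurring in its index expression.

```python
def invalidate(varlist, changed_var):
    """Drop registered array reads whose index expression uses changed_var."""
    return [(key, output) for key, output in varlist
            if not (isinstance(key, tuple) and changed_var in key[1])]

varlist = [
    (("a", ("i",)), "READ1.OUT"),   # value of a[i], already read from RAM
    ("i", 7),
    (("a", ("j",)), "READ2.OUT"),   # value of a[j]
]
after_i_changes = invalidate(varlist, "i")
```

After an assignment to i, the registered read of a[i] no longer describes the element that a[i] now denotes, so only the entry for a[j] and the scalar pairs survive. The entry for the scalar i itself is untouched here; the new definition of i is simply added to the front of the list as usual.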
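The following example with two read accesses compiles to the intermediate CDFG shown in Fig. 12.
STOP1 is the event connection which synchronizes the two accesses. Inputs START(old), i and j should
be substituted by the actual outputs resulting from the program before the array reads.

    x = a[i];
    y = a[j];
    z = x + y;

Fig. 13 shows the translation of the following write access:

    a[i] = x;

¹⁴A basic block is a program part with a single entry and a single exit point, i.e. a piece of straight-line code.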
Phase 2: We now merge the pseudo-functions of all accesses to the same RAM and substitute them by
a single RAM function. For all data inputs (RD for read access and WR and IN for write access), GATEs
are inserted between the input and the RAM function. Their E inputs are connected to the respective
START inputs of the original pseudo-functions. If a RAM is read and written at only one program point,
the U output of the read and write access is moved to the ERD or EWR output, respectively. For example,
the single access a[i] = x; from Fig. 13 is transformed to the final CDFG shown in Fig. 5.

However, if several read or several write accesses (i.e. pseudo-functions from different program points)
to the same RAM occur, the ERD or EWR events are not specific anymore. But a STARTnew event of
the original pseudo-function should only be generated for the respective program point, i.e. for the
current access. This is achieved by connecting the START signals of all other accesses (pseudo-functions)
of the same type (read or write) with the inverted START signal of the current access. The
resulting signal produces an event for every access, but only for the current access a 1-event. This event is
ECOMB-combined with the RAM's ERD or EWR output. The ECOMB's output will only occur after
the access is completed. Because ECOMB OR-combines its event packets, only the current access
produces a 1-event. Next, this event is filtered with a 1-FILTER and changed by a 0-CONSTANT, resulting
in a STARTnew signal which produces a 0-event only after the current access is completed as required.

For several accesses, several sources are connected to the RD, WR and IN inputs of a RAM. This disables
the self-synchronization. However, since only one access occurs at a time, the GATEs only allow one
data packet to arrive at the inputs.

For read accesses, the packets at the OUT output face the same problem as the ERD event packets:
They occur for every read access, but must only be used (and forwarded to subsequent operators) for
the current access. This can be achieved by connecting the OUT output via a DEMUX function. The Y
output of the DEMUX is used, and the X output is left unconnected. It then acts as a selective gate which
only forwards packets if its SEL input receives a 1-event, and discards its data input if SEL receives a
0-event. The signal created by the ECOMB described above for the STARTnew signal creates a 1-event
for the current access, and a 0-event otherwise. Using it as the SEL input achieves exactly the desired
functionality.
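
The gating behavior of such a DEMUX can be sketched in C as follows. This is an illustrative model only: packets and SEL events are represented as parallel arrays, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a DEMUX whose X output is left unconnected: a packet is
   copied to y[] only when the matching SEL event is 1; otherwise it
   is dropped. Returns the number of packets forwarded on Y. */
size_t demux_y_only(const int *out, const int *sel, size_t n, int *y) {
    assert(out && sel && y);
    size_t forwarded = 0;
    for (size_t i = 0; i < n; i++) {
        if (sel[i] == 1) {
            y[forwarded++] = out[i];
        }
    }
    return forwarded;
}
```

With the OUT packets of four read accesses and SEL events 1, 0, 1, 0, only the first and third packet reach the subsequent operators.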

Fig. 4 shows the resulting CDFG for the first example above (two read accesses), after applying the
transformations of Phase 2 to Fig. 12. STOP1 is now generated as follows: START(old) is inverted,
"2 to 1 connected" to STOP1 (because it is the START input of the second read pseudo-function),
ECOMB-combined with the RAM's ERD output and sent through the 1-FILTER/0-CONSTANT combination.
START(new) is generated similarly, but here START(old) is used directly and STOP1 is inverted. The
GATEs for input IN (i and j) are connected to START(old) and STOP1, respectively, and the DEMUX
functions for outputs x and y are connected to the ECOMB outputs related to STOP1 and START(new).

Multiple write accesses use the same control events, but instead of one GATE per access for the RD
inputs, one GATE for WR and one GATE for IN (with the same E input) are used. The EWR output is
processed like the ERD output for read accesses.

This transformation ensures that all RAM accesses are executed correctly, but it is not very fast since read
or write accesses to the same RAM are not pipelined. The next access only starts after the previous one
is completed, even if the RAM being used has several pipeline stages. This inefficiency can be removed
as follows:

First, continuous sequences of either read accesses or write accesses (not mixed) within a basic block are
detected by checking for pseudo-functions whose U output is directly connected to the START input of
another pseudo-function of the same RAM and the same type (read or write). For these sequences it is
possible to stream data into the RAM rather than waiting for the previous access to complete. For this
purpose, a combination of MERGE functions selects the RD or WR and IN inputs in the order given
by the sequence. The MERGEs must be controlled by iterative ESEQs guaranteeing that the inputs are
only forwarded in the desired order. Then only the first access in the sequence needs to be controlled by
a GATE or GATEs. Similarly, the OUT outputs of a read access can be distributed more efficiently for
a sequence. A combination of DEMUX functions with the same ESEQ control can be used. It is most
efficient to arrange the MERGE and DEMUX functions as balanced binary trees.
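
The balanced-tree arrangement can be sketched in C as follows. The sketch assumes 2-input MERGE nodes, a power-of-two number of inputs, and that SEL = 0 selects the first input and SEL = 1 the second; none of these details is stated in this section. The function name is hypothetical. The point is that each tree level is steered by one bit of the position in the access sequence, which is the pattern an ESEQ per level would have to produce.

```c
#include <assert.h>

/* Pick input `index` out of 2^depth inputs by walking a balanced
   binary tree of 2-input MERGE nodes. At each level one bit of the
   sequence index plays the role of the ESEQ-generated SEL event
   (assumption: SEL = 0 takes the first input, SEL = 1 the second). */
int merge_tree_select(const int *in, unsigned depth, unsigned index) {
    assert(in && depth < 16 && index < (1u << depth));
    while (depth > 0) {
        unsigned half = 1u << (depth - 1);
        unsigned sel = (index >> (depth - 1)) & 1u; /* SEL at this level */
        if (sel) in += half;                        /* descend into second half */
        index &= half - 1;                          /* position within subtree */
        depth--;
    }
    return in[0];
}
```

For a sequence of four accesses, the tree has two levels, compared with three chained MERGE stages in a linear arrangement.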

The STARTnew signal is generated as follows: For a sequence of length n, the START signal of the
entire sequence is replicated n times by an ESEQ[00..1] function with the START input connected to
the sequence's START. Its output is directly "N to 1 connected" with the other accesses' START signal
(for single accesses) or ESEQ output sent through 0-CONSTANT (for access sequences), ECOMB-
connected to EWR or ERD, respectively, and sent through a 1-FILTER/0-CONSTANT combination,
similar to the basic method described above. Since only the last ESEQ output is a 1-event, only the
last RAM access generates a STARTnew as required. Alternatively, for read accesses, the generation
of the last output can be sent through a GATE (without the E input connected), thereby producing a
STARTnew event.
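
The effect of the ESEQ[00..1] pattern can be illustrated with a short C sketch. It assumes, as in the basic method, that the RAM's ERD/EWR event carries the value 0 and that the ECOMB outputs the OR of its inputs; the function names are hypothetical.

```c
#include <assert.h>

/* k-th event (0-based) of the assumed ESEQ pattern 0...01 of length n:
   all events are 0 except the last one, which is 1. */
int eseq_last_one(unsigned n, unsigned k) {
    assert(n > 0 && k < n);
    return (k == n - 1) ? 1 : 0;
}

/* Count how many of the n accesses of a sequence yield a STARTnew
   event: the ESEQ event is OR-ed with the RAM's completion event
   (assumed 0) and then 1-filtered, so only 1-events survive. */
unsigned count_start_new(unsigned n) {
    unsigned fired = 0;
    for (unsigned k = 0; k < n; k++) {
        int combined = eseq_last_one(n, k) | 0; /* ECOMB with ERD/EWR */
        if (combined == 1) fired++;             /* 1-FILTER, then 0-CONSTANT */
    }
    return fired;
}
```

Regardless of the sequence length, exactly one STARTnew event is produced, and it belongs to the last access of the sequence.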

Fig. 14 shows the optimized version of the first example (Figures 12 and 4) using the ESEQ method for
generating STARTnew, and Fig. 6 shows the final CDFG of the following, larger example with three
array reads. Here the latter method for producing the STARTnew event is used.

z = a[k];

If several read sequences, or read sequences and single read accesses, occur for the same RAM, 1-events
for detecting the current accesses must be generated for sequences of read accesses. They are needed
to separate the OUT values relating to separate sequences. The ESEQ output just defined, sent through
a 1-CONSTANT, achieves this. It is again "N to 1 connected" to the other accesses' START signals
(for single accesses) or ESEQ outputs sent through 0-CONSTANT (for access sequences). The resulting
event is used to control a first-stage DEMUX which is inserted to select the relevant OUT output data
packets of the sequence as described above for the basic method. Refer to the second example (Figures
15 and 16) in Section 4.3 for a complete example.

Input and Output Ports

Input and output ports are processed similarly to vector accesses. A read from an input port is like an
array read without an address. The input data packet is sent to DEMUX functions which send it to the
correct subsequent operators. The STOP signal is generated in the same way as described above for
RAM accesses by combining the INPORT's U output with the current and other START signals.

Output ports control the data packets by GATEs like array write accesses. The STOP signal is also
created as for RAM accesses.

4.3 More Examples



Fig. 7 shows the generated CDFG for the following for loop.



a = b + c;
for (i=0; i<=10; i++) {
    a = a + i;
}
In this example, IN1 = {a} and IN2 = {k} (cf. Fig. 10). The MERGE function for variable a is
replaced by a 2 to 1 data connection as mentioned in the footnote of Section 4.2.5. Note that only one
data packet arrives for variables b, c and k, and one final packet is produced for a (out). forbody does
not use a START event since both operations (the adder and the RAM write) are dataflow-controlled
by the counter anyway. But the RAM's EWR output is the forbody's STARTnew and connected to
CNT's NEXT input. Note that the pipelining optimization, cf. Section 4.2.6, was not applied here. If it
is applied (which is possible for this loop), CNT's NEXT input is not connected, cf. Fig. 11. Here, the
loop iterations overlap. STARTnew is generated from CNT's U output and forbody's STARTnew (i.e.
RAM's EWR output), as defined at the end of Section 4.2.5.
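
The benefit of overlapping loop iterations can be illustrated by a rough cycle count. This is an idealized sketch and not taken from the source: the parameter latency (cycles one iteration needs from the counter output to the RAM's EWR event) and the assumption that a pipelined loop starts one new iteration per cycle are hypothetical, as are the function names.

```c
#include <assert.h>

/* NEXT connected to EWR: iteration i+1 starts only after iteration i
   has completed, so the iterations run strictly one after another. */
unsigned long cycles_sequential(unsigned long iterations, unsigned long latency) {
    assert(latency > 0);
    return iterations * latency;
}

/* NEXT left unconnected (pipelined): assumed one new iteration per
   cycle; the last iteration still needs `latency` cycles to finish. */
unsigned long cycles_pipelined(unsigned long iterations, unsigned long latency) {
    assert(latency > 0);
    return iterations == 0 ? 0 : latency + iterations - 1;
}
```

For the 11 iterations of the loop above (i = 0 to 10) and an assumed latency of 4 cycles, this model gives 44 cycles without overlap and 14 cycles with overlap.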

The following program contains a vectorizable (pipelined) loop with one write access to array (RAM) x
and a sequence of two read accesses to array (RAM) y. After the loop, another single read access to y
occurs.

z = 0;
for (i=0; i<=10; i++) {
    a = z + y[i] + y[2*i];
}
a = y[k];

Fig. 15 shows the intermediate CDFG generated before the array access Phase 2 transformation is
applied. The pipelined loop is controlled as follows: Within the loop, separate START signals for write
accesses to x and read accesses to y are used. The reentry to the forbody is also controlled by two
independent signals ("cycle1" and "cycle2"). For the read accesses, "cycle2" guarantees that the read y
accesses occur in the correct order. But the beginning of an iteration for read y and write x accesses is
not synchronized. Only at loop exit must all accesses be finished, which is guaranteed by the signal "loop
finished". The single read access is completely independent of the loop.

Fig. 16 shows the final CDFG after Phase 2. Note that "cycle1" is removed since a single write access
needs no additional control, and "cycle2" is removed since the inserted MERGE and DEMUX functions
automatically guarantee the correct execution order. The read y accesses are not independent anymore
since they all refer to the same RAM, and the functions have been merged. ESEQs have been allocated
to control the MERGE and DEMUX functions of the read sequence, and for the first-stage DEMUX
functions which separate the read OUT values for the read sequence and for the final single read access.
The ECOMBs, 1-FILTERs, 0-CONSTANTs and 1-CONSTANTs are allocated as described in Section
4.2.7, Phase 2, to generate correct control events for the GATEs and DEMUX functions.

Title: A method for compiling high-level language programs to a reconfigurable data-flow processor



Claims 

1. A method for providing configurations for a multidimensional array of coarse-grained and/or
fine-grained arithmetic and/or logic cells according to a high-level language comprising FOR-loops,
wherein
a counter (4.2.5) is implemented in said array when a loop is to be configured into said array.



